


United States Patent and Trademark Office 


UNITED STATES DEPARTMENT OF COMMERCE 
Address: COMMISSIONER FOR PATENTS
P.O. Box 1450
Alexandria, Virginia 22313-1450
www.uspto.gov


APPLICATION NO.: 09/642,452
FILING DATE: 08/18/2000
FIRST NAMED INVENTOR: Josef Bauer
ATTORNEY DOCKET NO.: POO,1701
CONFIRMATION NO.: 7124


7590 08/25/2003

Morrison & Foerster LLP 
1650 Tysons Boulevard 
Suite 300 

McLean, VA 22102 


EXAMINER: LERNER, MARTIN
ART UNIT: 2654
PAPER NUMBER:
DATE MAILED: 08/25/2003


Please find below and/or attached an Office communication concerning this application or proceeding. 


PTO-90C (Rev. 07-01) 


Office Action Summary 


Application No.: 09/642,452
Examiner: Martin Lerner
Applicant(s): BAUER ET AL.
Art Unit: 2654


- The MAILING DATE of this communication appears on the cover sheet with the correspondence address - 
Period for Reply 

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM 
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 
after SIX (6) MONTHS from the mailing date of this communication. 

- If the period for reply specified above is less than thirty (30) days, a reply within the statutory minimum of thirty (30) days will be considered timely. 
- Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any earned patent term adjustment. See 37 CFR 1.704(b).

Status 

1) [X] Responsive to communication(s) filed on 28 December 2000.
2a) [ ] This action is FINAL. 2b) [X] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.
Disposition of Claims 

4) [X] Claim(s) 15 to 28 is/are pending in the application.

4a) Of the above claim(s) is/are withdrawn from consideration.

5) [ ] Claim(s) is/are allowed.

6) [X] Claim(s) 15 to 28 is/are rejected.

7) [ ] Claim(s) is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [ ] The specification is objected to by the Examiner.

10) [X] The drawing(s) filed on 18 August 2000 is/are: a) [X] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).
11) [ ] The proposed drawing correction filed on is: a) [ ] approved b) [ ] disapproved by the Examiner.

If approved, corrected drawings are required in reply to this Office action.
1. Certified copies of the priority documents have been received.
application from the International Bureau (PCT Rule 17.2(a)).
DETAILED ACTION


Drawings 


1. The drawings are objected to because they do not contain word labels for the
method steps of the flow charts.

In Figure 1, the labels "nein" and "ja" should be changed to --no-- and --yes--,
respectively. Also, the understandability of the illustration can be substantially improved
by inserting appropriate word labels for the method steps S1 to S9 of the flow chart as
disclosed in the Substitute Specification, Pages 4 to 7. It is conventional for flow charts
to include English language word labels for patents issued in the United States.

In Figure 3, the understandability of the illustration can be substantially improved 
by changing the abbreviations for the word labels of the elements to correspond to the 
English language abbreviations of these elements as disclosed in the Substitute 
Specification, Pages 11 to 13. 

A proposed drawing correction or corrected drawings are required in reply to the 
Office action to avoid abandonment of the application. The objection to the drawings 
will not be held in abeyance. 

Specification 

2. The Substitute Specification filed 28 December 2000 has been entered. 

3. The disclosure is objected to because of the following informalities: 
On page 2, line 16, "a training phases" should be --a training phase--.
On page 9, line 22, "words that sounds similar" should be --words that sound
similar--.

On page 10, line 7, shouldn't "fourth version" be --fifth version--? The fifth
version is disclosed to involve an n-best list, but the fourth version evaluates the quality
of speech.

Appropriate correction is required. 

Claim Rejections - 35 USC § 102 

4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless - 

(e) the invention was described in a patent granted on an application for patent by another filed in the 
United States before the invention thereof by the applicant for patent, or on an international application 
by another who has fulfilled the requirements of paragraphs (1), (2), and (4) of section 371(c) of this
title before the invention thereof by the applicant for patent. 

5. Claims 15, 16, 20 to 24, and 28 are rejected under 35 U.S.C. 102(e) as being
anticipated by Polikaitis et al.

Regarding independent claims 15 and 28, Polikaitis et al. discloses a speech
recognition method and system, comprising: 

"determining words and pauses in speech on the basis of word boundaries" - 
microprocessor 110 has a speech/noise classifier for determining whether each frame is 
speech or noise; if the classifier identifies a frame as speech the classifier assigns the 
frame an SNflag of 1; if the classifier identifies the frame as noise, the classifier assigns
the frame an SNflag of 0; SNflag is a control value used to classify the frames (column
4, lines 31 to 41: Figure 1); a frame classified as speech corresponds to a "word" and a
frame classified as noise corresponds to a "pause", where the "word boundary" is the
transition between speech and background noise;

"determining an average silence volume during the pauses" - NoiseEnergy is the
average energy of all the noise frames as designated by an SNflag equal to 0 (column
5, lines 11 to 23);

"determining an average word volume for the words" - SpeechEnergy is the 
average energy of all speech frames as designated by an SNflag value equal to 1 
(column 5, lines 1 to 10);

"calculating a difference between the average word volume and the average
silence volume" - in step 260, microprocessor 110 compares the speech waveform
parameters to determine whether the user spoke too softly, Error 4; if the ratio ("a
difference") of SpeechEnergy to NoiseEnergy is less than a sixth threshold value,
Thresh6, then the speech signal is obscured by noise; while any values may be used for
Thresh6, Thresh6 is preferably in the range of 6 dB to 24 dB (column 8, lines 46 to 55:
Figure 2); the comparison of the ratio of SpeechEnergy to NoiseEnergy is a calculation
of a difference between the average word volume and the average silence volume; a
ratio represents a "difference" because a larger ratio implies a larger difference and a
smaller ratio implies a smaller difference; particularly, sound energies are designated by
decibel levels, so that a ratio of sound energies, when expressed in decibels, corresponds
to a subtraction of logarithms; a decibel (dB) is defined as "a unit for expressing the ratio
of two amounts of electric or acoustic signal power equal to 10 times the common
logarithm of this ratio" (Merriam-Webster's Dictionary);

"recognizing speech when the difference between the average word volume and 
the average silence volume is greater than a threshold" - in step 260, microprocessor 
110 compares the speech waveform parameters to determine whether the user spoke 
too softly, Error 4; if the ratio of SpeechEnergy to NoiseEnergy is less than a sixth 
threshold value, Thresh6, then the speech signal is obscured by noise (column 8, lines 
46 to 55: Figure 2); in step 260, if the ratio of SpeechEnergy to NoiseEnergy is greater 
than or equal to Thresh6, then the method proceeds to step 290; at step 290, 
microprocessor 110 performs the speech recognition process on the speech signal for 
transmission of a speech recognition signal to the communication interface circuitry 115 
(column 9, lines 19 to 34: Figure 2); thus, speech recognition is only performed if the 
ratio is greater than Thresh6. 
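The limitations mapped above for claims 15 and 28 can be illustrated with a short Python sketch. It is a hypothetical illustration, not code from Polikaitis et al. or from the application: it assumes per-frame energies are already available as positive linear power values and that each frame already carries an SNflag of 1 (speech) or 0 (noise). The names frame_energies, sn_flags, thresh6_db, and gate_recognition are invented for the example. The gate proceeds to recognition when the ratio, expressed in dB, is at least Thresh6, mirroring the step 260 / step 290 behavior summarized above.

```python
import math
from typing import Sequence, Tuple


def gate_recognition(frame_energies: Sequence[float],
                     sn_flags: Sequence[int],
                     thresh6_db: float) -> Tuple[float, bool]:
    """Return (ratio_db, recognize) for one utterance.

    Hypothetical sketch: frame_energies are positive linear power values,
    sn_flags holds 1 for speech frames and 0 for noise frames.
    """
    if len(frame_energies) != len(sn_flags):
        raise ValueError("one SNflag is required per frame")
    speech = [e for e, flag in zip(frame_energies, sn_flags) if flag == 1]
    noise = [e for e, flag in zip(frame_energies, sn_flags) if flag == 0]
    if not speech or not noise:
        raise ValueError("need at least one speech frame and one noise frame")

    speech_energy = sum(speech) / len(speech)  # "average word volume"
    noise_energy = sum(noise) / len(noise)     # "average silence volume"
    if speech_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive to express the ratio in dB")

    # SpeechEnergy/NoiseEnergy expressed in decibels.
    ratio_db = 10.0 * math.log10(speech_energy / noise_energy)

    # Recognition proceeds only when the ratio is not below Thresh6;
    # otherwise an error path (e.g. prompting the user to speak louder) applies.
    return ratio_db, ratio_db >= thresh6_db


if __name__ == "__main__":
    energies = [1.0, 1.0, 100.0, 100.0]
    flags = [0, 0, 1, 1]
    print(gate_recognition(energies, flags, thresh6_db=12.0))  # (20.0, True)
```

With a Thresh6 of 24 dB instead of 12 dB, the same utterance would fall below the threshold and recognition would not be performed.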

Regarding claim 16, Polikaitis et al. discloses that, while any values may be used
for Thresh6, Thresh6 is preferably in the range of 6 dB to 24 dB (column 8, lines 46 to
55: Figure 2); a decibel (dB) is defined as "a unit for expressing the ratio of two amounts
of electric or acoustic signal power equal to 10 times the common logarithm of this ratio"
(Merriam-Webster's Dictionary); thus, Polikaitis et al. discloses implicitly that
SpeechEnergy and NoiseEnergy are also measured in decibels, which are logarithmic
units.
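The decibel reasoning relied on for claims 15, 16, and 28 can be written out as a short derivation. The symbols E_speech and E_noise (linear average powers) and L_speech and L_noise (the same quantities in dB, relative to a common reference power) are introduced here only for illustration and do not appear in either reference.

```latex
\begin{aligned}
L_{\mathrm{speech}} &= 10\log_{10}E_{\mathrm{speech}}, \qquad
L_{\mathrm{noise}} = 10\log_{10}E_{\mathrm{noise}},\\
10\log_{10}\frac{E_{\mathrm{speech}}}{E_{\mathrm{noise}}}
  &= 10\log_{10}E_{\mathrm{speech}} - 10\log_{10}E_{\mathrm{noise}}
   = L_{\mathrm{speech}} - L_{\mathrm{noise}}.
\end{aligned}
```

For example, a power ratio of 100 corresponds to 20 dB, so an average word level of 60 dB over an average silence level of 40 dB gives a difference of 20 dB, which lies inside the preferred 6 dB to 24 dB range for Thresh6.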
Regarding claim 20, Polikaitis et al. discloses that Thresh6 is preferably set by the
manufacturer (column 8, lines 52 to 54); thus, Thresh6 is a constant.

Regarding claim 21, Polikaitis et al. discloses that no speech recognition is performed
if the ratio SpeechEnergy/NoiseEnergy is less than Thresh6 (column 8, lines 46 to 55: 
Figure 2); instead, an error procedure is performed. 

Regarding claim 22, Polikaitis et al. discloses that in step 263, microprocessor 110
informs the user that Error 4 has occurred; microprocessor 110 communicates Error 4
information via the communication output mechanism - communication interface
circuitry 115, speaker 135, display 150, and vibrator/buzzer 160; the information may be
communicated through a single output device or any combination of output devices
(column 8, lines 55 to 62: Figures 1 and 2); Error 4 information output through a speaker
or display is "a message".

Regarding claims 23 and 24, Polikaitis et al. discloses if Control 4 is option A, the 
user is prompted in step 270 to repeat the voice instruction and is prompted to speak 
louder (column 9, lines 5 to 8: Figure 2); implicitly, speaking louder causes 
SpeechEnergy ("average word volume") to increase relative to NoiseEnergy ("average 
silence volume") as an increased signal-to-noise ratio ("so that an adequate distance is 
achieved"). 
Claim Rejections - 35 USC § 103


6. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

7. Claims 17 to 19 and 25 to 26 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Polikaitis et al. in view of Wu et al.

Regarding claim 17, Polikaitis et al. discloses SpeechEnergy is the average
energy of all speech frames as designated by an SNflag value equal to 1, and
NoiseEnergy is the average energy of all the noise frames as designated by an SNflag
equal to 0, for all frames 1 to M, where M is the total number of frames (column 5, lines
1 to 23). Thus, SpeechEnergy and NoiseEnergy are global average values, and the
ratio SpeechEnergy/NoiseEnergy is a global difference of the values in decibels. Also,
Polikaitis et al. suggests the user may set or change the value of Thresh6 (column 8,
lines 52 to 55). However, Polikaitis et al. does not expressly disclose adapting
threshold Thresh6 on the basis of the global difference, although adaptive thresholds
are fairly well known. Wu et al. teaches a generally similar speech recognition method
for analyzing endpoints in speech with signal-to-noise ratios, where speech recognition
is only performed if a predetermined restart threshold level is identified (column 9,
line 56 to column 10, line 5). Wu et al. employs adaptive thresholds, T_s, T_e, T_sr, and
T_er, defined in terms of an average background noise level, N_bg, and average speech
energy levels, E_ls and E_le (column 7, line 25 to column 9, line 31: Figures 8, 9(a), and
9(b)). Specifically, Wu et al. says the method is advantageous for eliminating errors due
to mistaking breathing for actual speech (column 9, line 56 to column 10, line 5). It
would have been obvious to one having ordinary skill in the art to employ adaptive
thresholds defined in terms of average speech energy and average noise energy as
suggested by Wu et al. for the Thresh6 of Polikaitis et al., in order to eliminate errors due
to mistaking breathing for actual speech.

Regarding claim 18, Wu et al. discloses that the thresholds are related to the
signal-to-noise ratios, defined in terms of the differences E_ls - N_bg and E_le - N_bg
(column 8, lines 24 to 65).

Regarding claim 19, Wu et al. discloses general formulae for adaptive thresholds
T_sr and T_er, where the thresholds are diminished by a term C_3 N_bg, where C_3 is a
constant to account for conditions of unstable background noise (column 9, lines 20 to
31).

Regarding claim 25, Polikaitis et al. discloses SpeechEnergy is the average
energy of all speech frames as designated by an SNflag value equal to 1, and
NoiseEnergy is the average energy of all the noise frames as designated by an SNflag
equal to 0, for all frames 1 to M, where M is the total number of frames (column 5, lines
1 to 23). Thus, SpeechEnergy and NoiseEnergy are global average values, and
average noise is not measured for individual pauses, with the result that the difference
between average word volume and average silence volume is not measured in terms of
individual preceding or following silence energy values. However, Wu et al. teaches a
generally similar speech recognition method for analyzing endpoints in speech with
signal-to-noise ratios, where speech recognition is only performed if a predetermined
restart threshold level is identified (column 9, line 56 to column 10, line 5). Wu et al.
determines an average background noise level, N_bg, on the basis of segments of silence
energy defining a reliable island (column 7, lines 25 to 42: Figure 8). Similarly, Wu et
al. determines average speech energy levels, E_ls and E_le, on the basis of segments of
speech energy defining a reliable island (column 7, line 58 to column 8, line 23:
Figures 9(a) and 9(b)). Wu et al. says the method is advantageous for eliminating
errors due to mistaking breathing for actual speech (column 9, line 56 to column 10,
line 5). It would have been obvious to one having ordinary skill in the art to determine a
difference between average speech energy and average noise energy in terms of
individual preceding or following pauses as suggested by Wu et al., instead of the global
average speech energy and global average noise energy of Polikaitis et al., for the
purpose of eliminating errors due to mistaking breathing for actual speech.
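The distinction drawn for claim 25, between one global silence average and silence measured in the pauses adjacent to each word, can be sketched as follows. This is a hypothetical illustration only: it assumes the utterance has already been segmented into "word" and "pause" segments, each with a positive mean energy, and it averages the immediately preceding and following pauses of each word. It is not taken from Wu et al., Polikaitis et al., or the application, and the names Segment, to_db, and local_word_differences are invented.

```python
import math
from typing import List, Tuple

# Hypothetical representation: (kind, mean_energy), kind is "word" or "pause".
Segment = Tuple[str, float]


def to_db(energy: float) -> float:
    """Convert a positive linear power value to decibels."""
    return 10.0 * math.log10(energy)


def local_word_differences(segments: List[Segment]) -> List[Tuple[int, float]]:
    """For each word, return (segment_index, difference_db), where the
    difference is the word level minus the mean level of the adjacent pauses."""
    results = []
    for i, (kind, energy) in enumerate(segments):
        if kind != "word":
            continue
        # Collect the immediately preceding and following pauses, if present.
        neighbours = [segments[j][1] for j in (i - 1, i + 1)
                      if 0 <= j < len(segments) and segments[j][0] == "pause"]
        if not neighbours:
            continue  # no adjacent pause, so no local silence estimate
        local_silence = sum(neighbours) / len(neighbours)
        results.append((i, to_db(energy) - to_db(local_silence)))
    return results


if __name__ == "__main__":
    utterance = [("pause", 1.0), ("word", 100.0), ("pause", 1.0),
                 ("word", 1000.0), ("pause", 10.0)]
    for index, diff in local_word_differences(utterance):
        print(index, round(diff, 2))
```

A global approach, as summarized above for Polikaitis et al., would instead compare every word against a single average taken over all pause frames of the utterance.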

Regarding claim 26, Polikaitis et al. discloses SpeechEnergy is the average
energy of all speech frames, and NoiseEnergy is the average energy of all the noise
frames, for all frames 1 to M, where M is the total number of frames (column 5, lines 1
to 23). Polikaitis et al. discloses NoiseEnergy is a global average value, but omits
defining the average silence on the basis of a plurality of successive pauses. Wu et al.
determines an average background noise level, N_bg, on the basis of segments of silence
energy defining a reliable island, and similarly, determines average speech energy
levels, E_ls and E_le, on the basis of segments of speech energy defining a reliable island
(column 7, line 25 to column 8, line 23: Figures 8, 9(a), and 9(b)). Wu et al. says the
method is advantageous for eliminating errors due to mistaking breathing for actual
speech (column 9, line 56 to column 10, line 5). It would have been obvious to one
having ordinary skill in the art to combine the segmental energy averaging method of
Wu et al. with the global energy averaging method of Polikaitis et al. so as to determine
the global average silence energy on the basis of a sum of the energies of successive
silence segments for the purpose of eliminating errors due to mistaking breathing for
actual speech.

8. Claim 27 is rejected under 35 U.S.C. 103(a) as being unpatentable over Polikaitis
et al. in view of Wu et al. as applied to claims 20 to 26 above, and further in view of
Hamasaki et al.

Polikaitis et al. omits preparing an n-best list on the basis of the difference
between the average word volume and the average silence volume of individual words,
and determining the word to be inserted into the text according to a criterion of the
difference between the average word volume and the average silence volume of the
individual spoken words. However, Hamasaki et al. teaches a similar speech recognition
method, where a signal-to-noise ratio is calculated from the logarithm of the average
power of a speech segment and the logarithm of the average noise power (column 4,
lines 45 to 62: Figure 6). A recognition candidate determiner 14 determines the number
of candidates to present in an n-best list, varying according to the value of the SN ratio
with respect to a threshold x_p (column 3, line 21 to column 4, line 24; column 6, line 43
to column 7, line 33: Figures 3 and 4). Implicitly, the highest scoring word is inserted into
the text. Hamasaki et al. says the speech recognition method has the advantage of
improving a recognition rate by including words in an n-best list that might be eliminated
from the list due to a low signal-to-noise ratio (column 2, lines 5 to 49). It would have
been obvious to one having ordinary skill in the art to include the speech recognition
method of presenting the number of word candidates in an n-best list depending on the
value of the signal-to-noise ratio as suggested by Hamasaki et al. in the related speech
recognition method of Polikaitis et al., for the purpose of improving recognition accuracy
in the presence of noise.
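The combination proposed for claim 27, an n-best list whose length depends on the signal-to-noise ratio with the top candidate inserted into the text, can be sketched as follows. The candidate counts n_clean and n_noisy, the (word, score) representation, and the function name select_candidates are hypothetical. The only behavior taken from the summary above is that the number of candidates varies with the SN ratio relative to a threshold; the direction chosen here (more candidates when the ratio is low) follows the stated advantage of keeping words that a low ratio might otherwise eliminate.

```python
from typing import List, Tuple

# Hypothetical (word, score) pairs from a recognizer; higher score is better.
Candidate = Tuple[str, float]


def select_candidates(scored: List[Candidate], snr_db: float,
                      threshold_db: float, n_clean: int = 1,
                      n_noisy: int = 5) -> List[Candidate]:
    """Return an n-best list whose length depends on snr_db vs threshold_db."""
    ranked = sorted(scored, key=lambda c: c[1], reverse=True)
    # Low SNR: keep more candidates so the correct word is less likely dropped.
    n = n_clean if snr_db >= threshold_db else n_noisy
    return ranked[:n]


if __name__ == "__main__":
    scored = [("cat", 0.2), ("hat", 0.7), ("bat", 0.5)]
    clean = select_candidates(scored, snr_db=30.0, threshold_db=15.0)
    noisy = select_candidates(scored, snr_db=5.0, threshold_db=15.0)
    print(clean)        # [('hat', 0.7)]
    print(noisy)        # all three candidates, best first
    print(noisy[0][0])  # highest-scoring word, the one inserted into the text
```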

Conclusion 

9. The prior art made of record and not relied upon is considered pertinent to
Applicants' disclosure.

Malah, Sato et al., Nguyen, Muroi, Brown et al., Walker, Pastor, Nakagawa et al.,
Aktas et al., and Gerson et al. disclose related art.

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Martin Lerner, whose telephone number is (703) 308-9064.
The examiner can normally be reached from 8:30 AM to 6:00 PM, Monday to
Thursday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. The fax phone
numbers for the organization where this application or proceeding is assigned are (703)
872-9314 for regular communications and (703) 872-9315 for After Final
communications.

Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the receptionist, whose telephone number is (703) 305-4700.